Deterministic Indexing for Packed Strings
Authors
Abstract
Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In the deterministic variant, the goal is to solve the string indexing problem without any randomization, at either preprocessing time or query time. In the packed variant, the strings are stored with several characters in a single word, which makes it possible to read multiple characters simultaneously. Our main result is a new string index in the deterministic and packed setting. Given a packed string S of length n over an alphabet of size σ, we show how to preprocess S in deterministic O(n) time and O(n) space such that, given a packed pattern string of length m, we can support queries in deterministic time O(m/α + log m + log log σ), where α = w / log σ is the number of characters packed in a word of size w = Θ(log n). Our query time is always at least as good as the previous best known bounds, and whenever several characters are packed in a word, i.e., log σ ≪ w, the query times are faster.
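To make the packed setting concrete, the following is a minimal sketch of how α = w / log σ characters can share one word, and how a single XOR then compares α characters at once. It is only an illustration of packed reading, not the index described in the abstract. The 64-bit word size, the rounding of log σ up to a whole number of bits, and the helper names `bits_per_char`, `pack`, and `packed_lcp` are assumptions made for this example.

```python
W = 64  # assumed word size in bits (the abstract only requires w = Theta(log n))


def bits_per_char(sigma):
    """Bits needed for one character of an alphabet of size sigma."""
    return max(1, (sigma - 1).bit_length())


def pack(codes, sigma, w=W):
    """Pack integer character codes (each in 0..sigma-1) into w-bit words.

    Returns (words, alpha, b), where alpha = w // b characters fit in a word.
    The first character of each word occupies the lowest-order bits.
    """
    b = bits_per_char(sigma)
    alpha = w // b
    words = []
    for start in range(0, len(codes), alpha):
        word = 0
        for offset, code in enumerate(codes[start:start + alpha]):
            word |= code << (offset * b)
        words.append(word)
    return words, alpha, b


def packed_lcp(x_words, y_words, n_x, n_y, alpha, b):
    """Longest common prefix length of two packed strings.

    Each loop iteration compares alpha characters with one XOR, so the loop
    runs O(m / alpha) times rather than O(m).
    """
    limit = min(n_x, n_y)
    for i, (x, y) in enumerate(zip(x_words, y_words)):
        diff = x ^ y
        if diff:
            # The lowest set bit of diff lies inside the first mismatching character.
            low_bit = (diff & -diff).bit_length() - 1
            # Zero padding in the last word may differ past the end; clamp to limit.
            return min(limit, i * alpha + low_bit // b)
    return limit
```

For a DNA alphabet (σ = 4) each character takes 2 bits, so α = 32 characters fit in one 64-bit word and a 40-character string occupies two words.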
Similar resources
Probabilistic Threshold Indexing for Uncertain Strings
Strings form a fundamental data type in computer systems. String searching has been extensively studied since the inception of computer science. Increasingly many applications have to deal with imprecise strings or strings with fuzzy information in them. String matching becomes a probabilistic event when a string contains uncertainty, i.e. each position of the string can have different probable...
Deciding Indexing Strings with Statistical Analysis
Deciding on indexing strings is important for Information Retrieval. Ideally, the strings should be the words that represent the documents or query. Although each single word may be the first candidate for an indexing string in an English corpus, it may not be ideal due to the existence of compound nouns, which are often good indexing strings and which depend on the genre of the corpus. The situation is even wor...
A Generalized Approach for Image Indexing and Retrieval Based on 2-D Strings
2-D strings is one of a few representation structures originally designed for use in an IDB environment. In this paper, we propose a generalized approach for 2-D string based indexing which avoids the exhaustive search through the entire database of previous 2-D strings based techniques. The classical framework of representation of 2-D strings is also specialized to the cases of scaled and unsc...
Raz-McKenzie simulation with the inner product gadget
In this note we show that the Raz-McKenzie simulation algorithm which lifts deterministic query lower bounds to deterministic communication lower bounds can be implemented for functions f composed with the Inner Product gadget IP(x, y) = ∑_i x_i y_i mod 2 of logarithmic size. In other words, given a function f : {0, 1}^n → {0, 1} with deterministic query complexity D(f), we show that the determi...
Generating Indexing Functions of Regularly Sparse Arrays for Array Compilers
There are many applications involving arrays that contain non-zero components in regular geometric partitions. These include triangular, diagonal, tridiagonal, banded, etc. When computing with arrays of this type, they are usually stored in a packed form and computations are performed with only the non-zero components. This packed form requires an indexing function that maps an index of the arr...
Journal title:
Volume, issue:
Pages: -
Publication date: 2017